Examining heterogeneity

We had discussed wanting to have a quantifiable method for identifying those states that follow different trends from the rest. My first go at doing this involved taking the results of the COD decompositions over time for each state by sex and creating a linear model of the COD decompositions (Y) as a function of year and fixed effects for each state. I modelled year using a linear spline with 3 interior knots spaced equally. While this sounds rough, you’ll see that it adequately captures the trends for most states. We can also consider placing the knots at pre-specified years (say 1983, 1993). The first plot illustrates the observed COD contribution (teal) vs. the estimated contribution (pink) and is ordered by how close these trends overlap.

I then calculated the residual (observed COD contribution - estimated COD contribution) at each year and the next plot shows the residuals vs. year. The sum of the absolute residuals is displayed for each state, such that the states with the highest absolute residuals are most different from the average trend. I admit that this is a rough way to do this – it considered all states as having equal value (we could inverse weight by length of the CI for a start), but I think it allows us to capture which states are different from the average trend and speak to those. I also like that it doesn’t abritrarily create a cut-off that classifies states as “the same” and “different”.

#run 12 stratified models. bs() created the b-spline basis
models <- cod_decomp_results %>% 
  group_by(COD, sex) %>%
  do(fit = lm(COD_cont_yrs_mean ~ bs(year, degree = 1, df = 5) + state, 
              weights = 1/(COD_cont_yrs_SE)^2, 
              data = .)) 

#calculated predicted values for all models
dfPred = broom::augment(models, fit)
dfPred$year <- 1969:2013

#merge the predictions with original dataset
cod_decomp_results <- merge(cod_decomp_results, 
                            dfPred %>% select(COD, sex, year, state, .fitted, .resid), 
                            by = c("COD", "sex", "state", "year"))

#calculated the sum of the residuals for each state-sex-cod strata (summed over years)
cod_decomp_results <- cod_decomp_results %>% 
  group_by(state, sex, COD) %>%
  mutate(sum.residual = sum(abs(.resid)))